between species, compared in pairs, can be computed as a Hamming distance (i.e., the number of different characteristics); for example, consider three species A, B, and C, to which 10 characteristics labelled a to j are assigned:

\[
\begin{array}{ccccccccccc}
  & a & b & c & d & e & f & g & h & i & j \\
A & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 1 \\
B & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 \\
C & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{array}
\;.
\tag{17.5}
\]

This yields the symmetric distance matrix, in which each distance is expressed as the fraction of the 10 characteristics that differ:

\[
\begin{array}{cccc}
  & A   & B   & C   \\
A & 0.0 &     &     \\
B & 0.7 & 0.0 &     \\
C & 0.9 & 0.4 & 0.0
\end{array}
\;.
\tag{17.6}
\]
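The entries of (17.6) can be checked with a short script. This is a minimal sketch, assuming the characteristics of (17.5) are stored as lists of 0s and 1s; the names `characters`, `hamming_fraction`, and `distances` are illustrative and do not come from the text.

```python
from itertools import combinations

# Character table (17.5): 1 = characteristic present, 0 = absent (columns a to j).
characters = {
    "A": [1, 1, 1, 1, 1, 1, 1, 0, 0, 1],
    "B": [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
    "C": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
}


def hamming_fraction(x, y):
    """Fraction of positions at which two equal-length character vectors differ."""
    if len(x) != len(y):
        raise ValueError("character vectors must have equal length")
    return sum(a != b for a, b in zip(x, y)) / len(x)


# Lower triangle of the symmetric distance matrix, keyed by species pair.
distances = {
    (p, q): hamming_fraction(characters[p], characters[q])
    for p, q in combinations(sorted(characters), 2)
}

for (p, q), d in distances.items():
    print(f"d({p}, {q}) = {d:.1f}")
# d(A, B) = 0.7
# d(A, C) = 0.9
# d(B, C) = 0.4
```

Dividing by the number of characteristics is what turns the raw Hamming counts (7, 9, and 4) into the fractions shown in the matrix.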

The species are then clustered; the first cluster is formed from the closest pair (viz. B and C in this example) and the next cluster is formed between this pair and the species closest to its two members (and so forth in a larger group) to yield the following tree or dendrogram:

           ┌──── B
      ┌────┤
    ──┤    └──── C
      └──────── A .
                                                    (17.7)

This is the classical method; the root of the tree is the common ancestor.
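The clustering procedure described above can be sketched as a simple agglomerative loop. The text does not specify how the distance between an existing cluster and another species is defined; the sketch below assumes the smallest member-to-member distance (single linkage), which is only one possible reading of "closest to its two members". The function name `cluster` and the nested-tuple representation of the tree are illustrative choices.

```python
def cluster(distances, labels):
    """Agglomerative clustering returning the tree as nested tuples.

    Assumption: the distance between two clusters is the smallest distance
    between any pair of their members (single linkage).
    """
    def d(x, y):
        # The matrix is symmetric, so look the pair up in either order.
        return distances.get((x, y), distances.get((y, x)))

    # Each cluster is stored as (subtree, tuple of member species).
    clusters = [(label, (label,)) for label in labels]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                link = min(d(x, y)
                           for x in clusters[i][1]
                           for y in clusters[j][1])
                if best is None or link < best[0]:
                    best = (link, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        # Replace the two closest clusters by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Distances taken from matrix (17.6).
distances = {("A", "B"): 0.7, ("A", "C"): 0.9, ("B", "C"): 0.4}
tree = cluster(distances, ["A", "B", "C"])
print(tree)  # ('A', ('B', 'C'))
```

The nested tuple mirrors the dendrogram (17.7): B and C are joined first, and A attaches to that pair at the root.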

An alternative method, called cladistics,²⁶ counts the number of transformations necessary to go from a primitive to an evolved form. Hence, in the example, C differs by just one transformation from the putative primitive form (all zeros). Two transformations (of characters f and g) create a common ancestor to A and B, but it must be on a different branch from that of C, which does not have evolved forms of those two characteristics. This approach yields a different tree:

        ┌────── A
     ┌──┤
    ─┤  └─ B
     └─ C .
                                                    (17.8)
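The counts behind this tree can be reproduced from the character table. The following is a minimal sketch, assuming the all-zeros vector is the primitive form and that each 1 represents a single transformation; the variable names are illustrative, and the script only tallies transformations rather than performing a full cladistic analysis.

```python
from itertools import combinations

names = "abcdefghij"

# Character table (17.5): 1 = evolved form, 0 = primitive form (assumption).
characters = {
    "A": [1, 1, 1, 1, 1, 1, 1, 0, 0, 1],
    "B": [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
    "C": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
}

# Set of characters found in the evolved state for each species.
derived = {
    species: {n for n, state in zip(names, states) if state == 1}
    for species, states in characters.items()
}

# Number of transformations separating each species from the primitive form.
steps_from_primitive = {species: len(d) for species, d in derived.items()}
print(steps_from_primitive)  # {'A': 8, 'B': 3, 'C': 1}

# Evolved characters shared by each pair, i.e. candidates for a common ancestor.
shared = {
    (p, q): derived[p] & derived[q]
    for p, q in combinations(sorted(derived), 2)
}
print(sorted(shared[("A", "B")]))  # ['f', 'g']

# Transformations remaining on each branch after the common ancestor of A and B.
ancestor = shared[("A", "B")]
branch_lengths = {s: len(derived[s] - ancestor) for s in ("A", "B")}
print(branch_lengths)  # {'A': 6, 'B': 1}
```

Under these assumptions, only A and B share evolved characters (f and g), while C shares none with either. The branch lengths drawn in (17.8) correspond to the remaining transformations: six for A, one for B, and one for C.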

The principle of construction of a molecular phylogeny is to use the sequences

of the “same” genes (i.e., encoding a protein of the same function) in different

organisms as the characteristic of the species; that is, molecular phylogenies are based

on genotype rather than phenotype. In actual practice, protein sequences are typically

used, which are intermediate between genotype and phenotype. In the earliest studies

(1965–1975), cytochrome c was a popular object, since it is found in nearly all

²⁶ A clade is a taxonomic group comprising a single common ancestor and all its descendants

(i.e., a monophyletic group). A clade minus subclade(s) is called a paraphyletic group.